Dataset statistics
| Number of variables | 10 |
|---|---|
| Number of observations | 96432 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 7.4 MiB |
| Average record size in memory | 80.0 B |
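The memory figures above are consistent with ten fixed-width columns. The following is a minimal sketch of that arithmetic, assuming (this is not stated in the report) that every column is stored as an 8-byte value such as `int64`, `float64` or `datetime64[ns]`.

```python
# Assumption: all 10 columns use 8 bytes per value (int64/float64/datetime64[ns]).
N_ROWS = 96432
N_COLS = 10
BYTES_PER_VALUE = 8

record_bytes = N_COLS * BYTES_PER_VALUE        # bytes needed for one row
total_mib = N_ROWS * record_bytes / 2**20      # total bytes converted to MiB

print(f"{record_bytes:.1f} B per record")
print(f"{total_mib:.1f} MiB in total")
```

Under that assumption the result rounds to the reported 80.0 B per record and 7.4 MiB in total.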
Variable types
| DateTime | 1 |
|---|---|
| Numeric | 9 |
Alerts
| Alert | Type |
|---|---|
| maxtempC is highly correlated with mintempC and 4 other fields | High correlation |
| mintempC is highly correlated with maxtempC and 3 other fields | High correlation |
| cloudcover is highly correlated with maxtempC and 2 other fields | High correlation |
| humidity is highly correlated with maxtempC and 2 other fields | High correlation |
| sunHour is highly correlated with maxtempC and 2 other fields | High correlation |
| HeatIndexC is highly correlated with maxtempC and 3 other fields | High correlation |
| pressure is highly correlated with mintempC and 1 other field | High correlation |
| date_time has unique values | Unique |
| cloudcover has 6533 (6.8%) zeros | Zeros |
| precipMM has 84604 (87.7%) zeros | Zeros |
Reproduction
| Analysis started | 2022-10-14 16:31:22.146809 |
|---|---|
| Analysis finished | 2022-10-14 16:31:50.006657 |
| Duration | 27.86 seconds |
| Software version | pandas-profiling v3.3.0 |
| Download configuration | config.json |
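The reported duration can be checked from the two timestamps. This is a small sketch using only the values in the table above.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2022-10-14 16:31:22.146809", FMT)
finished = datetime.strptime("2022-10-14 16:31:50.006657", FMT)

# Elapsed wall-clock time of the analysis, in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```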
date_time
| Distinct | 96432 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 753.5 KiB |
| Minimum | 2009-01-01 00:00:00 |
|---|---|
| Maximum | 2020-01-01 23:00:00 |
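Since every `date_time` value is unique, the row count can be compared against a complete hourly grid between the minimum and maximum. The sketch below assumes hourly spacing, which is what the first and last rows of the dataset suggest.

```python
from datetime import datetime, timedelta

first = datetime(2009, 1, 1, 0, 0)
last = datetime(2020, 1, 1, 23, 0)

# Number of hourly timestamps in the closed interval [first, last].
expected_rows = (last - first) // timedelta(hours=1) + 1
print(expected_rows)
```

If the result equals the number of observations (96432), the series has no missing hours under the hourly-spacing assumption.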
Histogram with fixed size bins (bins=50)
maxtempC
| Distinct | 23 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 29.64609258 |
| Minimum | 18 |
|---|---|
| Maximum | 40 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 753.5 KiB |
Quantile statistics
| Minimum | 18 |
|---|---|
| 5-th percentile | 25 |
| Q1 | 27 |
| median | 29 |
| Q3 | 32 |
| 95-th percentile | 36 |
| Maximum | 40 |
| Range | 22 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 3.44642703 |
|---|---|
| Coefficient of variation (CV) | 0.1162523196 |
| Kurtosis | -0.139291169 |
| Mean | 29.64609258 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.4274565427 |
| Sum | 2858832 |
| Variance | 11.87785927 |
| Monotonicity | Not monotonic |
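Several of the statistics above are simple functions of the others. As an illustrative check, the sketch below recomputes the range, IQR, coefficient of variation and variance from the reported values (the numbers are copied from the tables; the variable names are only for illustration).

```python
# Values copied from the quantile and descriptive statistics above.
minimum, maximum = 18, 40
q1, q3 = 27, 32
mean = 29.64609258
std = 3.44642703

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance is the squared standard deviation

print(value_range, iqr, round(cv, 4), round(variance, 4))
```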
Histogram with fixed size bins (bins=23)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 28 | 14976 | 15.5% |
| 29 | 13296 | 13.8% |
| 27 | 11160 | 11.6% |
| 30 | 9384 | 9.7% |
| 26 | 7968 | 8.3% |
| 31 | 6456 | 6.7% |
| 34 | 5088 | 5.3% |
| 35 | 5016 | 5.2% |
| 33 | 4896 | 5.1% |
| 32 | 4392 | 4.6% |
| Other values (13) | 13800 | 14.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 18 | 24 | < 0.1% |
| 19 | 72 | 0.1% |
| 20 | 96 | 0.1% |
| 21 | 216 | 0.2% |
| 22 | 816 | 0.8% |
| 23 | 600 | 0.6% |
| 24 | 1776 | 1.8% |
| 25 | 3792 | 3.9% |
| 26 | 7968 | 8.3% |
| 27 | 11160 | 11.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 40 | 240 | 0.2% |
| 39 | 336 | 0.3% |
| 38 | 672 | 0.7% |
| 37 | 1680 | 1.7% |
| 36 | 3480 | 3.6% |
| 35 | 5016 | 5.2% |
| 34 | 5088 | 5.3% |
| 33 | 4896 | 5.1% |
| 32 | 4392 | 4.6% |
| 31 | 6456 | 6.7% |
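Taken together, the three value tables for maxtempC cover all 23 distinct values, so the summary statistics can be rebuilt from the counts alone. The sketch below merges the tables into one dictionary (the name `counts` is illustrative) and recomputes the number of observations, the sum and the mean.

```python
# value -> count for maxtempC, merged from the value tables above.
counts = {
    18: 24, 19: 72, 20: 96, 21: 216, 22: 816, 23: 600, 24: 1776,
    25: 3792, 26: 7968, 27: 11160, 28: 14976, 29: 13296, 30: 9384,
    31: 6456, 32: 4392, 33: 4896, 34: 5088, 35: 5016, 36: 3480,
    37: 1680, 38: 672, 39: 336, 40: 240,
}

n = sum(counts.values())                        # number of observations
total = sum(v * c for v, c in counts.items())   # the "Sum" statistic
mean = total / n                                # the "Mean" statistic

print(n, total, round(mean, 4))
```

The same weighted-sum approach works for any variable whose value tables list every distinct value.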
mintempC
| Distinct | 18 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 19.33673469 |
| Minimum | 11 |
|---|---|
| Maximum | 28 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 753.5 KiB |
Quantile statistics
| Minimum | 11 |
|---|---|
| 5-th percentile | 14 |
| Q1 | 18 |
| median | 20 |
| Q3 | 21 |
| 95-th percentile | 24 |
| Maximum | 28 |
| Range | 17 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 2.77377135 |
|---|---|
| Coefficient of variation (CV) | 0.1434456951 |
| Kurtosis | -0.05055967691 |
| Mean | 19.33673469 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.2244711172 |
| Sum | 1864680 |
| Variance | 7.6938075 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=18)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 20 | 18384 | 19.1% |
| 19 | 14424 | 15.0% |
| 21 | 12360 | 12.8% |
| 18 | 9240 | 9.6% |
| 22 | 8376 | 8.7% |
| 17 | 6864 | 7.1% |
| 23 | 5304 | 5.5% |
| 16 | 5232 | 5.4% |
| 15 | 4896 | 5.1% |
| 14 | 3312 | 3.4% |
| Other values (8) | 8040 | 8.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 11 | 48 | < 0.1% |
| 12 | 504 | 0.5% |
| 13 | 1776 | 1.8% |
| 14 | 3312 | 3.4% |
| 15 | 4896 | 5.1% |
| 16 | 5232 | 5.4% |
| 17 | 6864 | 7.1% |
| 18 | 9240 | 9.6% |
| 19 | 14424 | 15.0% |
| 20 | 18384 | 19.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 28 | 48 | < 0.1% |
| 27 | 240 | 0.2% |
| 26 | 552 | 0.6% |
| 25 | 1824 | 1.9% |
| 24 | 3048 | 3.2% |
| 23 | 5304 | 5.5% |
| 22 | 8376 | 8.7% |
| 21 | 12360 | 12.8% |
| 20 | 18384 | 19.1% |
| 19 | 14424 | 15.0% |
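The Frequency (%) column is each count divided by the number of observations. The sketch below shows one way to reproduce it; the helper `frequency_label` and the "< 0.1%" display rule are assumptions inferred from the printed tables, not taken from the profiling library's source.

```python
N_ROWS = 96432  # number of observations in the dataset

def frequency_label(count: int, total: int = N_ROWS) -> str:
    """Format count/total as a percentage with one decimal place."""
    text = f"{100 * count / total:.1f}%"
    # Assumed display rule: non-zero shares that round to 0.0% show as "< 0.1%".
    return "< 0.1%" if count > 0 and text == "0.0%" else text

# A few mintempC rows: (value, count)
for value, count in [(11, 48), (12, 504), (20, 18384)]:
    print(value, count, frequency_label(count))
```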
cloudcover
| Distinct | 101 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 34.84748839 |
| Minimum | 0 |
|---|---|
| Maximum | 100 |
| Zeros | 6533 |
| Zeros (%) | 6.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 753.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 9 |
| median | 29 |
| Q3 | 54 |
| 95-th percentile | 90 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range (IQR) | 45 |
Descriptive statistics
| Standard deviation | 28.39102052 |
|---|---|
| Coefficient of variation (CV) | 0.814722146 |
| Kurtosis | -0.6143135663 |
| Mean | 34.84748839 |
| Median Absolute Deviation (MAD) | 21 |
| Skewness | 0.6347253774 |
| Sum | 3360413 |
| Variance | 806.0500463 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6533 | 6.8% |
| 100 | 2868 | 3.0% |
| 4 | 2457 | 2.5% |
| 5 | 2320 | 2.4% |
| 3 | 2258 | 2.3% |
| 6 | 2131 | 2.2% |
| 7 | 1900 | 2.0% |
| 2 | 1810 | 1.9% |
| 8 | 1682 | 1.7% |
| 9 | 1621 | 1.7% |
| Other values (91) | 70852 | 73.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6533 | 6.8% |
| 1 | 1449 | 1.5% |
| 2 | 1810 | 1.9% |
| 3 | 2258 | 2.3% |
| 4 | 2457 | 2.5% |
| 5 | 2320 | 2.4% |
| 6 | 2131 | 2.2% |
| 7 | 1900 | 2.0% |
| 8 | 1682 | 1.7% |
| 9 | 1621 | 1.7% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 100 | 2868 | 3.0% |
| 99 | 174 | 0.2% |
| 98 | 188 | 0.2% |
| 97 | 181 | 0.2% |
| 96 | 219 | 0.2% |
| 95 | 221 | 0.2% |
| 94 | 208 | 0.2% |
| 93 | 240 | 0.2% |
| 92 | 254 | 0.3% |
| 91 | 264 | 0.3% |
humidity
| Distinct | 95 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 64.89546001 |
| Minimum | 6 |
|---|---|
| Maximum | 100 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 753.5 KiB |
Quantile statistics
| Minimum | 6 |
|---|---|
| 5-th percentile | 25 |
| Q1 | 49 |
| median | 68 |
| Q3 | 83 |
| 95-th percentile | 95 |
| Maximum | 100 |
| Range | 94 |
| Interquartile range (IQR) | 34 |
Descriptive statistics
| Standard deviation | 21.8568693 |
|---|---|
| Coefficient of variation (CV) | 0.3368012077 |
| Kurtosis | -0.7671365746 |
| Mean | 64.89546001 |
| Median Absolute Deviation (MAD) | 17 |
| Skewness | -0.4261100313 |
| Sum | 6257999 |
| Variance | 477.7227358 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 83 | 1757 | 1.8% |
| 86 | 1754 | 1.8% |
| 88 | 1751 | 1.8% |
| 85 | 1739 | 1.8% |
| 84 | 1705 | 1.8% |
| 82 | 1681 | 1.7% |
| 87 | 1677 | 1.7% |
| 80 | 1670 | 1.7% |
| 79 | 1643 | 1.7% |
| 81 | 1639 | 1.7% |
| Other values (85) | 79416 | 82.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 1 | < 0.1% |
| 7 | 3 | < 0.1% |
| 8 | 9 | < 0.1% |
| 9 | 26 | < 0.1% |
| 10 | 58 | 0.1% |
| 11 | 60 | 0.1% |
| 12 | 122 | 0.1% |
| 13 | 140 | 0.1% |
| 14 | 156 | 0.2% |
| 15 | 226 | 0.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 100 | 117 | 0.1% |
| 99 | 697 | 0.7% |
| 98 | 1315 | 1.4% |
| 97 | 1320 | 1.4% |
| 96 | 1244 | 1.3% |
| 95 | 1297 | 1.3% |
| 94 | 1332 | 1.4% |
| 93 | 1390 | 1.4% |
| 92 | 1433 | 1.5% |
| 91 | 1541 | 1.6% |
sunHour
| Distinct | 33 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.65348432 |
| Minimum | 4.2 |
|---|---|
| Maximum | 12.9 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 753.5 KiB |
Quantile statistics
| Minimum | 4.2 |
|---|---|
| 5-th percentile | 6.2 |
| Q1 | 8.8 |
| median | 11.6 |
| Q3 | 11.6 |
| 95-th percentile | 12.9 |
| Maximum | 12.9 |
| Range | 8.7 |
| Interquartile range (IQR) | 2.8 |
Descriptive statistics
| Standard deviation | 1.986738054 |
|---|---|
| Coefficient of variation (CV) | 0.1864871618 |
| Kurtosis | 0.9383413886 |
| Mean | 10.65348432 |
| Median Absolute Deviation (MAD) | 1.2 |
| Skewness | -1.19998749 |
| Sum | 1027336.8 |
| Variance | 3.947128094 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=33)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 11.6 | 42792 | 44.4% |
| 8.7 | 12120 | 12.6% |
| 12.9 | 10320 | 10.7% |
| 10.2 | 6048 | 6.3% |
| 12.8 | 4512 | 4.7% |
| 7.2 | 2400 | 2.5% |
| 8.8 | 2112 | 2.2% |
| 8.9 | 2064 | 2.1% |
| 11.9 | 1824 | 1.9% |
| 10.3 | 1656 | 1.7% |
| Other values (23) | 10584 | 11.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4.2 | 936 | 1.0% |
| 4.3 | 840 | 0.9% |
| 5.7 | 1080 | 1.1% |
| 5.8 | 528 | 0.5% |
| 5.9 | 72 | 0.1% |
| 6 | 408 | 0.4% |
| 6.1 | 576 | 0.6% |
| 6.2 | 840 | 0.9% |
| 7.2 | 2400 | 2.5% |
| 7.3 | 120 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 12.9 | 10320 | 10.7% |
| 12.8 | 4512 | 4.7% |
| 12.5 | 720 | 0.7% |
| 12.3 | 528 | 0.5% |
| 11.9 | 1824 | 1.9% |
| 11.8 | 216 | 0.2% |
| 11.6 | 42792 | 44.4% |
| 10.7 | 120 | 0.1% |
| 10.6 | 1032 | 1.1% |
| 10.5 | 48 | < 0.1% |
HeatIndexC
| Distinct | 31 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 25.26966152 |
| Minimum | 13 |
|---|---|
| Maximum | 43 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 753.5 KiB |
Quantile statistics
| Minimum | 13 |
|---|---|
| 5-th percentile | 18 |
| Q1 | 22 |
| median | 25 |
| Q3 | 28 |
| 95-th percentile | 33 |
| Maximum | 43 |
| Range | 30 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 4.43081103 |
|---|---|
| Coefficient of variation (CV) | 0.1753411309 |
| Kurtosis | -0.2225966062 |
| Mean | 25.26966152 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 0.0183084165 |
| Sum | 2436804 |
| Variance | 19.63208638 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=31)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 25 | 12633 | 13.1% |
| 26 | 10512 | 10.9% |
| 27 | 9033 | 9.4% |
| 28 | 7642 | 7.9% |
| 20 | 6507 | 6.7% |
| 29 | 5939 | 6.2% |
| 24 | 5707 | 5.9% |
| 19 | 4545 | 4.7% |
| 30 | 4340 | 4.5% |
| 21 | 4294 | 4.5% |
| Other values (21) | 25280 | 26.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 13 | 15 | < 0.1% |
| 14 | 114 | 0.1% |
| 15 | 442 | 0.5% |
| 16 | 1129 | 1.2% |
| 17 | 2050 | 2.1% |
| 18 | 2980 | 3.1% |
| 19 | 4545 | 4.7% |
| 20 | 6507 | 6.7% |
| 21 | 4294 | 4.5% |
| 22 | 3726 | 3.9% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 43 | 4 | < 0.1% |
| 42 | 11 | < 0.1% |
| 41 | 19 | < 0.1% |
| 40 | 33 | < 0.1% |
| 39 | 54 | 0.1% |
| 38 | 139 | 0.1% |
| 37 | 245 | 0.3% |
| 36 | 477 | 0.5% |
| 35 | 884 | 0.9% |
| 34 | 1477 | 1.5% |
precipMM
| Distinct | 87 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.07771901444 |
| Minimum | 0 |
|---|---|
| Maximum | 16.9 |
| Zeros | 84604 |
| Zeros (%) | 87.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 753.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0.4 |
| Maximum | 16.9 |
| Range | 16.9 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.3858652967 |
|---|---|
| Coefficient of variation (CV) | 4.96487635 |
| Kurtosis | 224.7864486 |
| Mean | 0.07771901444 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 11.19940752 |
| Sum | 7494.6 |
| Variance | 0.1488920272 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 84604 | 87.7% |
| 0.1 | 3677 | 3.8% |
| 0.2 | 1750 | 1.8% |
| 0.3 | 1069 | 1.1% |
| 0.4 | 848 | 0.9% |
| 0.5 | 647 | 0.7% |
| 0.6 | 511 | 0.5% |
| 0.7 | 385 | 0.4% |
| 0.8 | 359 | 0.4% |
| 0.9 | 293 | 0.3% |
| Other values (77) | 2289 | 2.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 84604 | 87.7% |
| 0.1 | 3677 | 3.8% |
| 0.2 | 1750 | 1.8% |
| 0.3 | 1069 | 1.1% |
| 0.4 | 848 | 0.9% |
| 0.5 | 647 | 0.7% |
| 0.6 | 511 | 0.5% |
| 0.7 | 385 | 0.4% |
| 0.8 | 359 | 0.4% |
| 0.9 | 293 | 0.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 16.9 | 1 | < 0.1% |
| 16.4 | 1 | < 0.1% |
| 15.1 | 1 | < 0.1% |
| 12.7 | 1 | < 0.1% |
| 12.3 | 1 | < 0.1% |
| 11.3 | 1 | < 0.1% |
| 11.1 | 1 | < 0.1% |
| 10.1 | 1 | < 0.1% |
| 9.1 | 1 | < 0.1% |
| 8.8 | 1 | < 0.1% |
pressure
| Distinct | 22 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1010.554225 |
| Minimum | 1000 |
|---|---|
| Maximum | 1021 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 753.5 KiB |
Quantile statistics
| Minimum | 1000 |
|---|---|
| 5-th percentile | 1006 |
| Q1 | 1008 |
| median | 1010 |
| Q3 | 1013 |
| 95-th percentile | 1016 |
| Maximum | 1021 |
| Range | 21 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 3.187015913 |
|---|---|
| Coefficient of variation (CV) | 0.00315373073 |
| Kurtosis | -0.5051289699 |
| Mean | 1010.554225 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.1600586231 |
| Sum | 97449765 |
| Variance | 10.15707043 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=22)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1009 | 11225 | 11.6% |
| 1010 | 11021 | 11.4% |
| 1008 | 10181 | 10.6% |
| 1011 | 10128 | 10.5% |
| 1012 | 9042 | 9.4% |
| 1013 | 8236 | 8.5% |
| 1007 | 8033 | 8.3% |
| 1014 | 7165 | 7.4% |
| 1015 | 5696 | 5.9% |
| 1006 | 5004 | 5.2% |
| Other values (12) | 10701 | 11.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1000 | 1 | < 0.1% |
| 1001 | 24 | < 0.1% |
| 1002 | 76 | 0.1% |
| 1003 | 345 | 0.4% |
| 1004 | 1170 | 1.2% |
| 1005 | 2700 | 2.8% |
| 1006 | 5004 | 5.2% |
| 1007 | 8033 | 8.3% |
| 1008 | 10181 | 10.6% |
| 1009 | 11225 | 11.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1021 | 11 | < 0.1% |
| 1020 | 101 | 0.1% |
| 1019 | 353 | 0.4% |
| 1018 | 683 | 0.7% |
| 1017 | 1662 | 1.7% |
| 1016 | 3575 | 3.7% |
| 1015 | 5696 | 5.9% |
| 1014 | 7165 | 7.4% |
| 1013 | 8236 | 8.5% |
| 1012 | 9042 | 9.4% |
windspeedKmph
Real number (ℝ≥0)
| Distinct | 41 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12.44893811 |
| Minimum | 0 |
|---|---|
| Maximum | 41 |
| Zeros | 16 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 753.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 8 |
| median | 12 |
| Q3 | 16 |
| 95-th percentile | 23 |
| Maximum | 41 |
| Range | 41 |
| Interquartile range (IQR) | 8 |
Descriptive statistics
| Standard deviation | 5.71676889 |
|---|---|
| Coefficient of variation (CV) | 0.4592173918 |
| Kurtosis | 0.8143122374 |
| Mean | 12.44893811 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 0.8047187624 |
| Sum | 1200476 |
| Variance | 32.68144654 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=41)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 9 | 7990 | 8.3% |
| 12 | 7897 | 8.2% |
| 10 | 7694 | 8.0% |
| 11 | 6783 | 7.0% |
| 8 | 6509 | 6.7% |
| 13 | 6443 | 6.7% |
| 14 | 5762 | 6.0% |
| 15 | 5332 | 5.5% |
| 7 | 4838 | 5.0% |
| 6 | 4644 | 4.8% |
| Other values (31) | 32540 | 33.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 16 | < 0.1% |
| 1 | 294 | 0.3% |
| 2 | 681 | 0.7% |
| 3 | 1621 | 1.7% |
| 4 | 2312 | 2.4% |
| 5 | 3303 | 3.4% |
| 6 | 4644 | 4.8% |
| 7 | 4838 | 5.0% |
| 8 | 6509 | 6.7% |
| 9 | 7990 | 8.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 41 | 1 | < 0.1% |
| 39 | 9 | < 0.1% |
| 38 | 10 | < 0.1% |
| 37 | 21 | < 0.1% |
| 36 | 40 | < 0.1% |
| 35 | 71 | 0.1% |
| 34 | 52 | 0.1% |
| 33 | 94 | 0.1% |
| 32 | 136 | 0.1% |
| 31 | 216 | 0.2% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
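As a rough illustration of that definition, the sketch below ranks both variables (tied values share the mean of their positions) and then applies the covariance-over-standard-deviations formula to the ranks. The function names are illustrative and this is not the profiling library's implementation; the sample values are humidity and HeatIndexC from the first ten rows of the dataset.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their std devs."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

humidity = [91, 93, 94, 96, 88, 80, 72, 61, 49, 37]
heat_index = [18, 17, 16, 15, 18, 22, 25, 26, 27, 28]
print(round(spearman_rho(humidity, heat_index), 3))
```

Ten rows are far too few to say anything about the full dataset; the sample only demonstrates the mechanics.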
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
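The definition and the invariance property can be sketched in a few lines. The function name `pearson_r` is illustrative, and the sample values are pressure and windspeedKmph from the first ten rows of the dataset; the rescaling applied to pressure is an arbitrary example of a change in location and scale.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

pressure = [1014, 1014, 1014, 1014, 1015, 1016, 1016, 1016, 1015, 1014]
windspeed = [8, 6, 4, 3, 3, 3, 4, 5, 6, 7]

r = pearson_r(pressure, windspeed)
# Dividing by 10 and subtracting 100 changes scale and location but not r.
r_rescaled = pearson_r([p / 10 - 100 for p in pressure], windspeed)
print(round(r, 3), round(r_rescaled, 3))
```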
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, determine the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
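The pair-counting formula in this paragraph corresponds to the variant usually called tau-a. The sketch below implements exactly that formula; statistical libraries often report the tie-adjusted tau-b instead, so results may differ when the data contain ties. The function name is illustrative, and the sample values are cloudcover and humidity from the last ten rows of the dataset.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

cloudcover = [54, 63, 67, 71, 74, 74, 73, 72, 69, 66]
humidity = [60, 61, 65, 68, 72, 76, 81, 86, 88, 89]
print(round(kendall_tau_a(cloudcover, humidity), 3))
```

This direct version compares every pair, so it takes quadratic time and is only practical for small samples.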
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the Phik project.
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
First rows
| | date_time | maxtempC | mintempC | cloudcover | humidity | sunHour | HeatIndexC | precipMM | pressure | windspeedKmph |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2009-01-01 00:00:00 | 27 | 12 | 2 | 91 | 11.6 | 18 | 0.0 | 1014 | 8 |
| 1 | 2009-01-01 01:00:00 | 27 | 12 | 2 | 93 | 11.6 | 17 | 0.0 | 1014 | 6 |
| 2 | 2009-01-01 02:00:00 | 27 | 12 | 2 | 94 | 11.6 | 16 | 0.0 | 1014 | 4 |
| 3 | 2009-01-01 03:00:00 | 27 | 12 | 2 | 96 | 11.6 | 15 | 0.0 | 1014 | 3 |
| 4 | 2009-01-01 04:00:00 | 27 | 12 | 1 | 88 | 11.6 | 18 | 0.0 | 1015 | 3 |
| 5 | 2009-01-01 05:00:00 | 27 | 12 | 1 | 80 | 11.6 | 22 | 0.0 | 1016 | 3 |
| 6 | 2009-01-01 06:00:00 | 27 | 12 | 0 | 72 | 11.6 | 25 | 0.0 | 1016 | 4 |
| 7 | 2009-01-01 07:00:00 | 27 | 12 | 0 | 61 | 11.6 | 26 | 0.0 | 1016 | 5 |
| 8 | 2009-01-01 08:00:00 | 27 | 12 | 0 | 49 | 11.6 | 27 | 0.0 | 1015 | 6 |
| 9 | 2009-01-01 09:00:00 | 27 | 12 | 0 | 37 | 11.6 | 28 | 0.0 | 1014 | 7 |
Last rows
| | date_time | maxtempC | mintempC | cloudcover | humidity | sunHour | HeatIndexC | precipMM | pressure | windspeedKmph |
|---|---|---|---|---|---|---|---|---|---|---|
| 96422 | 2020-01-01 14:00:00 | 26 | 18 | 54 | 60 | 8.7 | 27 | 0.0 | 1013 | 17 |
| 96423 | 2020-01-01 15:00:00 | 26 | 18 | 63 | 61 | 8.7 | 27 | 0.0 | 1012 | 17 |
| 96424 | 2020-01-01 16:00:00 | 26 | 18 | 67 | 65 | 8.7 | 26 | 0.0 | 1013 | 16 |
| 96425 | 2020-01-01 17:00:00 | 26 | 18 | 71 | 68 | 8.7 | 26 | 0.2 | 1013 | 16 |
| 96426 | 2020-01-01 18:00:00 | 26 | 18 | 74 | 72 | 8.7 | 25 | 0.3 | 1014 | 15 |
| 96427 | 2020-01-01 19:00:00 | 26 | 18 | 74 | 76 | 8.7 | 25 | 0.1 | 1014 | 16 |
| 96428 | 2020-01-01 20:00:00 | 26 | 18 | 73 | 81 | 8.7 | 24 | 0.6 | 1015 | 16 |
| 96429 | 2020-01-01 21:00:00 | 26 | 18 | 72 | 86 | 8.7 | 23 | 0.8 | 1016 | 17 |
| 96430 | 2020-01-01 22:00:00 | 26 | 18 | 69 | 88 | 8.7 | 22 | 0.4 | 1016 | 16 |
| 96431 | 2020-01-01 23:00:00 | 26 | 18 | 66 | 89 | 8.7 | 21 | 0.5 | 1016 | 16 |